Goal/Purpose of operations:
Checking GTEx samples for confounders and exploratory data analysis
Finished pseudocode on:
210806
System which operations were done on:
Jen lab mac
GitHub Repo:
230321_JLF_Sex_bias_adverse_events
Directory of operations:
/home/rstudio
Scripts being edited for operations:
NA
Docker:
jenfisher7/rstudio_sex_bias_drugs
Data being used:
GTEx
GTEx_Analysis_v8_Annotations_SubjectPhenotypesDS.txt
GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt
downloaded from https://gtexportal.org/home/datasets on 210517
Papers and tools:
Main- ggplot2 and tidyverse packages
additional tools starting at line 59.
setwd("/home/rstudio")
library(recount3)
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, aperm, append, as.data.frame, basename, cbind,
## colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
## get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
## match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
## table, tapply, union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:base':
##
## expand.grid, I, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
library(rlang)
##
## Attaching package: 'rlang'
## The following object is masked from 'package:Biobase':
##
## exprs
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:Biobase':
##
## combine
## The following objects are masked from 'package:GenomicRanges':
##
## intersect, setdiff, union
## The following object is masked from 'package:GenomeInfoDb':
##
## intersect
## The following objects are masked from 'package:IRanges':
##
## collapse, desc, intersect, setdiff, slice, union
## The following objects are masked from 'package:S4Vectors':
##
## first, intersect, rename, setdiff, setequal, union
## The following objects are masked from 'package:BiocGenerics':
##
## combine, intersect, setdiff, union
## The following object is masked from 'package:matrixStats':
##
## count
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(dbplyr)
##
## Attaching package: 'dbplyr'
## The following objects are masked from 'package:dplyr':
##
## ident, sql
library(tidyverse)
## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.0
## ✔ tibble 3.1.8 ✔ stringr 1.5.0
## ✔ tidyr 1.2.1 ✔ forcats 0.5.2
## ✔ readr 2.1.3
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ purrr::%@%() masks rlang::%@%()
## ✖ dplyr::collapse() masks IRanges::collapse()
## ✖ dplyr::combine() masks Biobase::combine(), BiocGenerics::combine()
## ✖ dplyr::count() masks matrixStats::count()
## ✖ dplyr::desc() masks IRanges::desc()
## ✖ tidyr::expand() masks S4Vectors::expand()
## ✖ rlang::exprs() masks Biobase::exprs()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::first() masks S4Vectors::first()
## ✖ purrr::flatten() masks rlang::flatten()
## ✖ purrr::flatten_chr() masks rlang::flatten_chr()
## ✖ purrr::flatten_dbl() masks rlang::flatten_dbl()
## ✖ purrr::flatten_int() masks rlang::flatten_int()
## ✖ purrr::flatten_lgl() masks rlang::flatten_lgl()
## ✖ purrr::flatten_raw() masks rlang::flatten_raw()
## ✖ dbplyr::ident() masks dplyr::ident()
## ✖ purrr::invoke() masks rlang::invoke()
## ✖ dplyr::lag() masks stats::lag()
## ✖ ggplot2::Position() masks BiocGenerics::Position(), base::Position()
## ✖ purrr::reduce() masks GenomicRanges::reduce(), IRanges::reduce()
## ✖ dplyr::rename() masks S4Vectors::rename()
## ✖ dplyr::slice() masks IRanges::slice()
## ✖ purrr::splice() masks rlang::splice()
## ✖ dbplyr::sql() masks dplyr::sql()
library(ggplot2)
library(viridis)
## Loading required package: viridisLite
library(RColorBrewer)
library(DESeq2)
Get GTEx data from recount3
human_projects <- readRDS("~/data/human_recount3_projects.rds")
gtex_proj_info <- subset(human_projects, file_source == "gtex")
for (i in seq_len(nrow(gtex_proj_info))) {
  name <- paste(gtex_proj_info[i, 1], "rse", sep = "_")
  assign(name, create_rse(gtex_proj_info[i, ]))
}
## 2023-05-02 18:10:32 downloading and reading the metadata.
## 2023-05-02 18:10:33 caching file gtex.gtex.ADIPOSE_TISSUE.MD.gz.
## 2023-05-02 18:10:33 caching file gtex.recount_project.ADIPOSE_TISSUE.MD.gz.
## 2023-05-02 18:10:34 caching file gtex.recount_qc.ADIPOSE_TISSUE.MD.gz.
## 2023-05-02 18:10:35 caching file gtex.recount_seq_qc.ADIPOSE_TISSUE.MD.gz.
## 2023-05-02 18:10:36 downloading and reading the feature information.
## 2023-05-02 18:10:37 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:10:37 downloading and reading the counts: 1293 samples across 63856 features.
## 2023-05-02 18:10:38 caching file gtex.gene_sums.ADIPOSE_TISSUE.G026.gz.
## 2023-05-02 18:10:49 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:10:49 downloading and reading the metadata.
## 2023-05-02 18:10:50 caching file gtex.gtex.MUSCLE.MD.gz.
## 2023-05-02 18:10:51 caching file gtex.recount_project.MUSCLE.MD.gz.
## 2023-05-02 18:10:51 caching file gtex.recount_qc.MUSCLE.MD.gz.
## 2023-05-02 18:10:52 caching file gtex.recount_seq_qc.MUSCLE.MD.gz.
## 2023-05-02 18:10:52 downloading and reading the feature information.
## 2023-05-02 18:10:53 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:10:54 downloading and reading the counts: 881 samples across 63856 features.
## 2023-05-02 18:10:54 caching file gtex.gene_sums.MUSCLE.G026.gz.
## 2023-05-02 18:10:58 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:10:58 downloading and reading the metadata.
## 2023-05-02 18:10:58 caching file gtex.gtex.BLOOD_VESSEL.MD.gz.
## 2023-05-02 18:10:59 caching file gtex.recount_project.BLOOD_VESSEL.MD.gz.
## 2023-05-02 18:11:00 caching file gtex.recount_qc.BLOOD_VESSEL.MD.gz.
## 2023-05-02 18:11:00 caching file gtex.recount_seq_qc.BLOOD_VESSEL.MD.gz.
## 2023-05-02 18:11:01 downloading and reading the feature information.
## 2023-05-02 18:11:02 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:11:02 downloading and reading the counts: 1398 samples across 63856 features.
## 2023-05-02 18:11:03 caching file gtex.gene_sums.BLOOD_VESSEL.G026.gz.
## 2023-05-02 18:11:09 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:11:09 downloading and reading the metadata.
## 2023-05-02 18:11:10 caching file gtex.gtex.HEART.MD.gz.
## 2023-05-02 18:11:10 caching file gtex.recount_project.HEART.MD.gz.
## 2023-05-02 18:11:11 caching file gtex.recount_qc.HEART.MD.gz.
## 2023-05-02 18:11:12 caching file gtex.recount_seq_qc.HEART.MD.gz.
## 2023-05-02 18:11:12 downloading and reading the feature information.
## 2023-05-02 18:11:13 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:11:13 downloading and reading the counts: 942 samples across 63856 features.
## 2023-05-02 18:11:14 caching file gtex.gene_sums.HEART.G026.gz.
## 2023-05-02 18:11:18 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:11:18 downloading and reading the metadata.
## 2023-05-02 18:11:19 caching file gtex.gtex.OVARY.MD.gz.
## 2023-05-02 18:11:19 caching file gtex.recount_project.OVARY.MD.gz.
## 2023-05-02 18:11:20 caching file gtex.recount_qc.OVARY.MD.gz.
## 2023-05-02 18:11:21 caching file gtex.recount_seq_qc.OVARY.MD.gz.
## 2023-05-02 18:11:22 downloading and reading the feature information.
## 2023-05-02 18:11:22 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:11:23 downloading and reading the counts: 195 samples across 63856 features.
## 2023-05-02 18:11:23 caching file gtex.gene_sums.OVARY.G026.gz.
## 2023-05-02 18:11:24 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:11:24 downloading and reading the metadata.
## 2023-05-02 18:11:24 caching file gtex.gtex.UTERUS.MD.gz.
## 2023-05-02 18:11:25 caching file gtex.recount_project.UTERUS.MD.gz.
## 2023-05-02 18:11:26 caching file gtex.recount_qc.UTERUS.MD.gz.
## 2023-05-02 18:11:26 caching file gtex.recount_seq_qc.UTERUS.MD.gz.
## 2023-05-02 18:11:27 downloading and reading the feature information.
## 2023-05-02 18:11:28 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:11:28 downloading and reading the counts: 159 samples across 63856 features.
## 2023-05-02 18:11:29 caching file gtex.gene_sums.UTERUS.G026.gz.
## 2023-05-02 18:11:29 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:11:29 downloading and reading the metadata.
## 2023-05-02 18:11:30 caching file gtex.gtex.VAGINA.MD.gz.
## 2023-05-02 18:11:31 caching file gtex.recount_project.VAGINA.MD.gz.
## 2023-05-02 18:11:31 caching file gtex.recount_qc.VAGINA.MD.gz.
## 2023-05-02 18:11:32 caching file gtex.recount_seq_qc.VAGINA.MD.gz.
## 2023-05-02 18:11:33 downloading and reading the feature information.
## 2023-05-02 18:11:33 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:11:34 downloading and reading the counts: 173 samples across 63856 features.
## 2023-05-02 18:11:34 caching file gtex.gene_sums.VAGINA.G026.gz.
## 2023-05-02 18:11:35 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:11:35 downloading and reading the metadata.
## 2023-05-02 18:11:35 caching file gtex.gtex.BREAST.MD.gz.
## 2023-05-02 18:11:36 caching file gtex.recount_project.BREAST.MD.gz.
## 2023-05-02 18:11:37 caching file gtex.recount_qc.BREAST.MD.gz.
## 2023-05-02 18:11:37 caching file gtex.recount_seq_qc.BREAST.MD.gz.
## 2023-05-02 18:11:38 downloading and reading the feature information.
## 2023-05-02 18:11:38 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:11:39 downloading and reading the counts: 482 samples across 63856 features.
## 2023-05-02 18:11:40 caching file gtex.gene_sums.BREAST.G026.gz.
## 2023-05-02 18:11:42 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:11:42 downloading and reading the metadata.
## 2023-05-02 18:11:42 caching file gtex.gtex.SKIN.MD.gz.
## 2023-05-02 18:11:43 caching file gtex.recount_project.SKIN.MD.gz.
## 2023-05-02 18:11:44 caching file gtex.recount_qc.SKIN.MD.gz.
## 2023-05-02 18:11:44 caching file gtex.recount_seq_qc.SKIN.MD.gz.
## 2023-05-02 18:11:45 downloading and reading the feature information.
## 2023-05-02 18:11:45 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:11:46 downloading and reading the counts: 1940 samples across 63856 features.
## 2023-05-02 18:11:46 caching file gtex.gene_sums.SKIN.G026.gz.
## 2023-05-02 18:11:57 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:11:57 downloading and reading the metadata.
## 2023-05-02 18:11:58 caching file gtex.gtex.SALIVARY_GLAND.MD.gz.
## 2023-05-02 18:11:58 caching file gtex.recount_project.SALIVARY_GLAND.MD.gz.
## 2023-05-02 18:11:59 caching file gtex.recount_qc.SALIVARY_GLAND.MD.gz.
## 2023-05-02 18:12:00 caching file gtex.recount_seq_qc.SALIVARY_GLAND.MD.gz.
## 2023-05-02 18:12:00 downloading and reading the feature information.
## 2023-05-02 18:12:01 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:12:01 downloading and reading the counts: 178 samples across 63856 features.
## 2023-05-02 18:12:02 caching file gtex.gene_sums.SALIVARY_GLAND.G026.gz.
## 2023-05-02 18:12:03 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:12:03 downloading and reading the metadata.
## 2023-05-02 18:12:03 caching file gtex.gtex.BRAIN.MD.gz.
## 2023-05-02 18:12:04 caching file gtex.recount_project.BRAIN.MD.gz.
## 2023-05-02 18:12:05 caching file gtex.recount_qc.BRAIN.MD.gz.
## 2023-05-02 18:12:06 caching file gtex.recount_seq_qc.BRAIN.MD.gz.
## 2023-05-02 18:12:06 downloading and reading the feature information.
## 2023-05-02 18:12:07 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:12:08 downloading and reading the counts: 2931 samples across 63856 features.
## 2023-05-02 18:12:09 caching file gtex.gene_sums.BRAIN.G026.gz.
## 2023-05-02 18:12:26 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:12:26 downloading and reading the metadata.
## 2023-05-02 18:12:26 caching file gtex.gtex.ADRENAL_GLAND.MD.gz.
## 2023-05-02 18:12:27 caching file gtex.recount_project.ADRENAL_GLAND.MD.gz.
## 2023-05-02 18:12:28 caching file gtex.recount_qc.ADRENAL_GLAND.MD.gz.
## 2023-05-02 18:12:29 caching file gtex.recount_seq_qc.ADRENAL_GLAND.MD.gz.
## 2023-05-02 18:12:29 downloading and reading the feature information.
## 2023-05-02 18:12:29 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:12:30 downloading and reading the counts: 274 samples across 63856 features.
## 2023-05-02 18:12:31 caching file gtex.gene_sums.ADRENAL_GLAND.G026.gz.
## 2023-05-02 18:12:32 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:12:32 downloading and reading the metadata.
## 2023-05-02 18:12:33 caching file gtex.gtex.THYROID.MD.gz.
## 2023-05-02 18:12:34 caching file gtex.recount_project.THYROID.MD.gz.
## 2023-05-02 18:12:35 caching file gtex.recount_qc.THYROID.MD.gz.
## 2023-05-02 18:12:36 caching file gtex.recount_seq_qc.THYROID.MD.gz.
## 2023-05-02 18:12:36 downloading and reading the feature information.
## 2023-05-02 18:12:37 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:12:37 downloading and reading the counts: 706 samples across 63856 features.
## 2023-05-02 18:12:38 caching file gtex.gene_sums.THYROID.G026.gz.
## 2023-05-02 18:12:40 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:12:40 downloading and reading the metadata.
## 2023-05-02 18:12:41 caching file gtex.gtex.LUNG.MD.gz.
## 2023-05-02 18:12:42 caching file gtex.recount_project.LUNG.MD.gz.
## 2023-05-02 18:12:43 caching file gtex.recount_qc.LUNG.MD.gz.
## 2023-05-02 18:12:43 caching file gtex.recount_seq_qc.LUNG.MD.gz.
## 2023-05-02 18:12:44 downloading and reading the feature information.
## 2023-05-02 18:12:44 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:12:45 downloading and reading the counts: 655 samples across 63856 features.
## 2023-05-02 18:12:46 caching file gtex.gene_sums.LUNG.G026.gz.
## 2023-05-02 18:12:48 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:12:48 downloading and reading the metadata.
## 2023-05-02 18:12:48 caching file gtex.gtex.SPLEEN.MD.gz.
## 2023-05-02 18:12:49 caching file gtex.recount_project.SPLEEN.MD.gz.
## 2023-05-02 18:12:50 caching file gtex.recount_qc.SPLEEN.MD.gz.
## 2023-05-02 18:12:50 caching file gtex.recount_seq_qc.SPLEEN.MD.gz.
## 2023-05-02 18:12:51 downloading and reading the feature information.
## 2023-05-02 18:12:51 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:12:52 downloading and reading the counts: 255 samples across 63856 features.
## 2023-05-02 18:12:52 caching file gtex.gene_sums.SPLEEN.G026.gz.
## 2023-05-02 18:12:53 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:12:53 downloading and reading the metadata.
## 2023-05-02 18:12:54 caching file gtex.gtex.PANCREAS.MD.gz.
## 2023-05-02 18:12:54 caching file gtex.recount_project.PANCREAS.MD.gz.
## 2023-05-02 18:12:55 caching file gtex.recount_qc.PANCREAS.MD.gz.
## 2023-05-02 18:12:56 caching file gtex.recount_seq_qc.PANCREAS.MD.gz.
## 2023-05-02 18:12:56 downloading and reading the feature information.
## 2023-05-02 18:12:57 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:12:57 downloading and reading the counts: 360 samples across 63856 features.
## 2023-05-02 18:12:58 caching file gtex.gene_sums.PANCREAS.G026.gz.
## 2023-05-02 18:12:59 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:12:59 downloading and reading the metadata.
## 2023-05-02 18:13:00 caching file gtex.gtex.ESOPHAGUS.MD.gz.
## 2023-05-02 18:13:00 caching file gtex.recount_project.ESOPHAGUS.MD.gz.
## 2023-05-02 18:13:01 caching file gtex.recount_qc.ESOPHAGUS.MD.gz.
## 2023-05-02 18:13:02 caching file gtex.recount_seq_qc.ESOPHAGUS.MD.gz.
## 2023-05-02 18:13:02 downloading and reading the feature information.
## 2023-05-02 18:13:03 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:13:03 downloading and reading the counts: 1577 samples across 63856 features.
## 2023-05-02 18:13:04 caching file gtex.gene_sums.ESOPHAGUS.G026.gz.
## 2023-05-02 18:13:09 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:13:09 downloading and reading the metadata.
## 2023-05-02 18:13:10 caching file gtex.gtex.STOMACH.MD.gz.
## 2023-05-02 18:13:11 caching file gtex.recount_project.STOMACH.MD.gz.
## 2023-05-02 18:13:11 caching file gtex.recount_qc.STOMACH.MD.gz.
## 2023-05-02 18:13:12 caching file gtex.recount_seq_qc.STOMACH.MD.gz.
## 2023-05-02 18:13:13 downloading and reading the feature information.
## 2023-05-02 18:13:13 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:13:14 downloading and reading the counts: 384 samples across 63856 features.
## 2023-05-02 18:13:14 caching file gtex.gene_sums.STOMACH.G026.gz.
## 2023-05-02 18:13:16 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:13:16 downloading and reading the metadata.
## 2023-05-02 18:13:16 caching file gtex.gtex.COLON.MD.gz.
## 2023-05-02 18:13:17 caching file gtex.recount_project.COLON.MD.gz.
## 2023-05-02 18:13:18 caching file gtex.recount_qc.COLON.MD.gz.
## 2023-05-02 18:13:19 caching file gtex.recount_seq_qc.COLON.MD.gz.
## 2023-05-02 18:13:19 downloading and reading the feature information.
## 2023-05-02 18:13:20 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:13:21 downloading and reading the counts: 822 samples across 63856 features.
## 2023-05-02 18:13:21 caching file gtex.gene_sums.COLON.G026.gz.
## 2023-05-02 18:13:24 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:13:24 downloading and reading the metadata.
## 2023-05-02 18:13:25 caching file gtex.gtex.SMALL_INTESTINE.MD.gz.
## 2023-05-02 18:13:26 caching file gtex.recount_project.SMALL_INTESTINE.MD.gz.
## 2023-05-02 18:13:26 caching file gtex.recount_qc.SMALL_INTESTINE.MD.gz.
## 2023-05-02 18:13:27 caching file gtex.recount_seq_qc.SMALL_INTESTINE.MD.gz.
## 2023-05-02 18:13:27 downloading and reading the feature information.
## 2023-05-02 18:13:28 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:13:28 downloading and reading the counts: 193 samples across 63856 features.
## 2023-05-02 18:13:29 caching file gtex.gene_sums.SMALL_INTESTINE.G026.gz.
## 2023-05-02 18:13:29 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:13:30 downloading and reading the metadata.
## 2023-05-02 18:13:30 caching file gtex.gtex.PROSTATE.MD.gz.
## 2023-05-02 18:13:31 caching file gtex.recount_project.PROSTATE.MD.gz.
## 2023-05-02 18:13:31 caching file gtex.recount_qc.PROSTATE.MD.gz.
## 2023-05-02 18:13:32 caching file gtex.recount_seq_qc.PROSTATE.MD.gz.
## 2023-05-02 18:13:33 downloading and reading the feature information.
## 2023-05-02 18:13:33 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:13:34 downloading and reading the counts: 263 samples across 63856 features.
## 2023-05-02 18:13:34 caching file gtex.gene_sums.PROSTATE.G026.gz.
## 2023-05-02 18:13:35 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:13:35 downloading and reading the metadata.
## 2023-05-02 18:13:36 caching file gtex.gtex.TESTIS.MD.gz.
## 2023-05-02 18:13:36 caching file gtex.recount_project.TESTIS.MD.gz.
## 2023-05-02 18:13:37 caching file gtex.recount_qc.TESTIS.MD.gz.
## 2023-05-02 18:13:38 caching file gtex.recount_seq_qc.TESTIS.MD.gz.
## 2023-05-02 18:13:38 downloading and reading the feature information.
## 2023-05-02 18:13:39 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:13:39 downloading and reading the counts: 410 samples across 63856 features.
## 2023-05-02 18:13:40 caching file gtex.gene_sums.TESTIS.G026.gz.
## 2023-05-02 18:13:42 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:13:42 downloading and reading the metadata.
## 2023-05-02 18:13:43 caching file gtex.gtex.NERVE.MD.gz.
## 2023-05-02 18:13:44 caching file gtex.recount_project.NERVE.MD.gz.
## 2023-05-02 18:13:44 caching file gtex.recount_qc.NERVE.MD.gz.
## 2023-05-02 18:13:45 caching file gtex.recount_seq_qc.NERVE.MD.gz.
## 2023-05-02 18:13:46 downloading and reading the feature information.
## 2023-05-02 18:13:46 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:13:46 downloading and reading the counts: 659 samples across 63856 features.
## 2023-05-02 18:13:47 caching file gtex.gene_sums.NERVE.G026.gz.
## 2023-05-02 18:13:49 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:13:49 downloading and reading the metadata.
## 2023-05-02 18:13:49 caching file gtex.gtex.PITUITARY.MD.gz.
## 2023-05-02 18:13:50 caching file gtex.recount_project.PITUITARY.MD.gz.
## 2023-05-02 18:13:51 caching file gtex.recount_qc.PITUITARY.MD.gz.
## 2023-05-02 18:13:51 caching file gtex.recount_seq_qc.PITUITARY.MD.gz.
## 2023-05-02 18:13:52 downloading and reading the feature information.
## 2023-05-02 18:13:52 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:13:53 downloading and reading the counts: 301 samples across 63856 features.
## 2023-05-02 18:13:53 caching file gtex.gene_sums.PITUITARY.G026.gz.
## 2023-05-02 18:13:55 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:13:55 downloading and reading the metadata.
## 2023-05-02 18:13:55 caching file gtex.gtex.BLOOD.MD.gz.
## 2023-05-02 18:13:56 caching file gtex.recount_project.BLOOD.MD.gz.
## 2023-05-02 18:13:56 caching file gtex.recount_qc.BLOOD.MD.gz.
## 2023-05-02 18:13:57 caching file gtex.recount_seq_qc.BLOOD.MD.gz.
## 2023-05-02 18:13:58 downloading and reading the feature information.
## 2023-05-02 18:13:58 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:13:59 downloading and reading the counts: 1048 samples across 63856 features.
## 2023-05-02 18:13:59 caching file gtex.gene_sums.BLOOD.G026.gz.
## 2023-05-02 18:14:07 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:14:07 downloading and reading the metadata.
## 2023-05-02 18:14:07 caching file gtex.gtex.LIVER.MD.gz.
## 2023-05-02 18:14:08 caching file gtex.recount_project.LIVER.MD.gz.
## 2023-05-02 18:14:09 caching file gtex.recount_qc.LIVER.MD.gz.
## 2023-05-02 18:14:09 caching file gtex.recount_seq_qc.LIVER.MD.gz.
## 2023-05-02 18:14:10 downloading and reading the feature information.
## 2023-05-02 18:14:10 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:14:11 downloading and reading the counts: 251 samples across 63856 features.
## 2023-05-02 18:14:11 caching file gtex.gene_sums.LIVER.G026.gz.
## 2023-05-02 18:14:12 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:14:12 downloading and reading the metadata.
## 2023-05-02 18:14:13 caching file gtex.gtex.KIDNEY.MD.gz.
## 2023-05-02 18:14:13 caching file gtex.recount_project.KIDNEY.MD.gz.
## 2023-05-02 18:14:14 caching file gtex.recount_qc.KIDNEY.MD.gz.
## 2023-05-02 18:14:15 caching file gtex.recount_seq_qc.KIDNEY.MD.gz.
## 2023-05-02 18:14:15 downloading and reading the feature information.
## 2023-05-02 18:14:16 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:14:16 downloading and reading the counts: 98 samples across 63856 features.
## 2023-05-02 18:14:17 caching file gtex.gene_sums.KIDNEY.G026.gz.
## 2023-05-02 18:14:17 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:14:17 downloading and reading the metadata.
## 2023-05-02 18:14:18 caching file gtex.gtex.CERVIX_UTERI.MD.gz.
## 2023-05-02 18:14:18 caching file gtex.recount_project.CERVIX_UTERI.MD.gz.
## 2023-05-02 18:14:19 caching file gtex.recount_qc.CERVIX_UTERI.MD.gz.
## 2023-05-02 18:14:20 caching file gtex.recount_seq_qc.CERVIX_UTERI.MD.gz.
## 2023-05-02 18:14:20 downloading and reading the feature information.
## 2023-05-02 18:14:21 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:14:21 downloading and reading the counts: 19 samples across 63856 features.
## 2023-05-02 18:14:22 caching file gtex.gene_sums.CERVIX_UTERI.G026.gz.
## 2023-05-02 18:14:22 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:14:22 downloading and reading the metadata.
## 2023-05-02 18:14:23 caching file gtex.gtex.FALLOPIAN_TUBE.MD.gz.
## 2023-05-02 18:14:24 caching file gtex.recount_project.FALLOPIAN_TUBE.MD.gz.
## 2023-05-02 18:14:25 caching file gtex.recount_qc.FALLOPIAN_TUBE.MD.gz.
## 2023-05-02 18:14:26 caching file gtex.recount_seq_qc.FALLOPIAN_TUBE.MD.gz.
## 2023-05-02 18:14:26 downloading and reading the feature information.
## 2023-05-02 18:14:27 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:14:27 downloading and reading the counts: 9 samples across 63856 features.
## 2023-05-02 18:14:28 caching file gtex.gene_sums.FALLOPIAN_TUBE.G026.gz.
## 2023-05-02 18:14:29 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:14:29 downloading and reading the metadata.
## 2023-05-02 18:14:29 caching file gtex.gtex.BLADDER.MD.gz.
## 2023-05-02 18:14:30 caching file gtex.recount_project.BLADDER.MD.gz.
## 2023-05-02 18:14:31 caching file gtex.recount_qc.BLADDER.MD.gz.
## 2023-05-02 18:14:31 caching file gtex.recount_seq_qc.BLADDER.MD.gz.
## 2023-05-02 18:14:32 downloading and reading the feature information.
## 2023-05-02 18:14:32 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:14:33 downloading and reading the counts: 21 samples across 63856 features.
## 2023-05-02 18:14:33 caching file gtex.gene_sums.BLADDER.G026.gz.
## 2023-05-02 18:14:34 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:14:34 downloading and reading the metadata.
## 2023-05-02 18:14:34 caching file gtex.gtex.STUDY_NA.MD.gz.
## 2023-05-02 18:14:35 caching file gtex.recount_project.STUDY_NA.MD.gz.
## 2023-05-02 18:14:36 caching file gtex.recount_qc.STUDY_NA.MD.gz.
## 2023-05-02 18:14:36 caching file gtex.recount_seq_qc.STUDY_NA.MD.gz.
## 2023-05-02 18:14:37 downloading and reading the feature information.
## 2023-05-02 18:14:37 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:14:38 downloading and reading the counts: 133 samples across 63856 features.
## 2023-05-02 18:14:38 caching file gtex.gene_sums.STUDY_NA.G026.gz.
## 2023-05-02 18:14:39 construcing the RangedSummarizedExperiment (rse) object.
## 2023-05-02 18:14:39 downloading and reading the metadata.
## 2023-05-02 18:14:39 caching file gtex.gtex.BONE_MARROW.MD.gz.
## 2023-05-02 18:14:40 caching file gtex.recount_project.BONE_MARROW.MD.gz.
## 2023-05-02 18:14:40 caching file gtex.recount_qc.BONE_MARROW.MD.gz.
## 2023-05-02 18:14:41 caching file gtex.recount_seq_qc.BONE_MARROW.MD.gz.
## 2023-05-02 18:14:42 downloading and reading the feature information.
## 2023-05-02 18:14:42 caching file human.gene_sums.G026.gtf.gz.
## 2023-05-02 18:14:43 downloading and reading the counts: 204 samples across 63856 features.
## 2023-05-02 18:14:43 caching file gtex.gene_sums.BONE_MARROW.G026.gz.
## 2023-05-02 18:14:44 construcing the RangedSummarizedExperiment (rse) object.
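# Names of the per-project RSE objects created above (e.g. "ADIPOSE_TISSUE_rse")
name <- paste(gtex_proj_info[, 1], "rse", sep = "_")
# Stack the sample metadata (colData) of every project into one table
gtex_coldata <- colData(get(name[1]))
for (i in 2:length(name)) {
  coldata <- colData(get(name[i]))
  gtex_coldata <- rbind(gtex_coldata, coldata)
}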
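The row-binding step above can also be written without `assign()`/`get()` by keeping the per-project tables in a named list. The following is a minimal sketch, not the code used in this analysis: the list `rse_coldata` and its toy values are hypothetical stand-ins for the `colData()` of each downloaded project.

```r
# Toy stand-ins for colData(<project>_rse); values are invented for illustration.
rse_coldata <- list(
  ADIPOSE_TISSUE = data.frame(gtex.smts = "Adipose Tissue", gtex.sex = c(1, 2)),
  MUSCLE = data.frame(gtex.smts = "Muscle", gtex.sex = c(1, 1, 2)),
  OVARY = data.frame(gtex.smts = "Ovary", gtex.sex = 2)
)

# Bind all tables in a single call instead of growing one table inside a loop.
combined <- do.call(rbind, rse_coldata)

nrow(combined)
table(combined$gtex.smts)
```

Keeping the objects in a list avoids creating one global variable per tissue and makes it easy to apply the same check to every project with `lapply()`.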
Look at the sample metadata
gtex_coldata <- as.data.frame(gtex_coldata)
table(is.na(gtex_coldata$gtex.age))
##
## FALSE TRUE
## 19010 204
table(gtex_coldata$gtex.age)
##
## 20-29 30-39 40-49 50-59 60-69 70-79
## 1501 1428 2998 6089 6338 656
table(is.na(gtex_coldata$gtex.sex))
##
## FALSE TRUE
## 19010 204
table(gtex_coldata$gtex.sex)
##
## 1 2
## 12568 6442
1 = male, 2 = female
table(gtex_coldata$gtex.dthhrdy)
##
## 0 1 2 3 4
## 9700 740 5145 1020 2254
table(is.na(gtex_coldata$gtex.dthhrdy))
##
## FALSE TRUE
## 18859 355
0 = Ventilator case, 1 = Violent and fast death, 2 = Fast death of natural causes, 3 = Intermediate death, 4 = Slow death
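The missing-value checks above are run one column at a time (204 samples lack age and sex, 355 lack a death classification). A small sketch of how the same counts could be collected in one step follows. The data frame `meta` is a hypothetical toy table that only reuses the column names from this document.

```r
# Toy metadata; column names follow the document, values are made up.
meta <- data.frame(
  gtex.age = c("20-29", NA, "60-69", "50-59"),
  gtex.sex = c(1, NA, 2, 1),
  gtex.dthhrdy = c(0, NA, NA, 4)
)

# Count missing values for all covariates of interest at once.
covariates <- c("gtex.age", "gtex.sex", "gtex.dthhrdy")
n_missing <- colSums(is.na(meta[, covariates]))
n_missing

# useNA = "ifany" reports NA as its own category instead of dropping it.
table(meta$gtex.dthhrdy, useNA = "ifany")
```

Using `useNA = "ifany"` in the frequency tables would show the missing samples alongside the coded categories, so a separate `is.na()` table is not needed.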
How many samples per tissue?
as.data.frame(table(gtex_coldata$gtex.smts))
## Var1 Freq
## 1 Adipose Tissue 1293
## 2 Adrenal Gland 274
## 3 Bladder 21
## 4 Blood 1048
## 5 Blood Vessel 1398
## 6 Bone Marrow 204
## 7 Brain 2931
## 8 Breast 482
## 9 Cervix Uteri 19
## 10 Colon 822
## 11 Esophagus 1577
## 12 Fallopian Tube 9
## 13 Heart 942
## 14 Kidney 98
## 15 Liver 251
## 16 Lung 655
## 17 Muscle 881
## 18 Nerve 659
## 19 Ovary 195
## 20 Pancreas 360
## 21 Pituitary 301
## 22 Prostate 263
## 23 Salivary Gland 178
## 24 Skin 1940
## 25 Small Intestine 193
## 26 Spleen 255
## 27 Stomach 384
## 28 Testis 410
## 29 Thyroid 706
## 30 Uterus 159
## 31 Vagina 173
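Because the goal is to check for confounders, the per-tissue counts can be extended to a sex-by-tissue cross-tabulation, which makes sex-specific or strongly imbalanced tissues visible. This is a sketch on a hypothetical toy table `meta`, assuming the numeric coding 1 = male and 2 = female described above.

```r
# Toy metadata; tissue names and sex codes follow the document, counts are invented.
meta <- data.frame(
  gtex.smts = c("Ovary", "Ovary", "Testis", "Lung", "Lung", "Lung"),
  gtex.sex = c(2, 2, 1, 1, 2, 1)
)

# Rows are tissues, columns are the sex codes "1" and "2".
sex_by_tissue <- table(meta$gtex.smts, meta$gtex.sex)
sex_by_tissue

# Share of each tissue's samples coded 1 (male); 0 or 1 marks a single-sex tissue.
prop_male <- prop.table(sex_by_tissue, margin = 1)[, "1"]
prop_male
```

Tissues with a proportion of exactly 0 or 1 cannot be used for a within-tissue sex comparison.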
Detailed tissue subtypes
as.data.frame(table(gtex_coldata$gtex.smtsd))
## Var1 Freq
## 1 Adipose - Subcutaneous 731
## 2 Adipose - Visceral (Omentum) 562
## 3 Adrenal Gland 274
## 4 Artery - Aorta 452
## 5 Artery - Coronary 253
## 6 Artery - Tibial 693
## 7 Bladder 21
## 8 Brain - Amygdala 163
## 9 Brain - Anterior cingulate cortex (BA24) 201
## 10 Brain - Caudate (basal ganglia) 273
## 11 Brain - Cerebellar Hemisphere 250
## 12 Brain - Cerebellum 285
## 13 Brain - Cortex 286
## 14 Brain - Frontal Cortex (BA9) 224
## 15 Brain - Hippocampus 220
## 16 Brain - Hypothalamus 221
## 17 Brain - Nucleus accumbens (basal ganglia) 262
## 18 Brain - Putamen (basal ganglia) 221
## 19 Brain - Spinal cord (cervical c-1) 171
## 20 Brain - Substantia nigra 154
## 21 Breast - Mammary Tissue 482
## 22 Cells - Cultured fibroblasts 520
## 23 Cells - EBV-transformed lymphocytes 196
## 24 Cells - Leukemia cell line (CML) 204
## 25 Cervix - Ectocervix 9
## 26 Cervix - Endocervix 10
## 27 Colon - Sigmoid 389
## 28 Colon - Transverse 433
## 29 Esophagus - Gastroesophageal Junction 399
## 30 Esophagus - Mucosa 625
## 31 Esophagus - Muscularis 553
## 32 Fallopian Tube 9
## 33 Heart - Atrial Appendage 450
## 34 Heart - Left Ventricle 492
## 35 Kidney - Cortex 94
## 36 Kidney - Medulla 4
## 37 Liver 251
## 38 Lung 655
## 39 Minor Salivary Gland 178
## 40 Muscle - Skeletal 881
## 41 Nerve - Tibial 659
## 42 Ovary 195
## 43 Pancreas 360
## 44 Pituitary 301
## 45 Prostate 263
## 46 Skin - Not Sun Exposed (Suprapubic) 639
## 47 Skin - Sun Exposed (Lower leg) 781
## 48 Small Intestine - Terminal Ileum 193
## 49 Spleen 255
## 50 Stomach 384
## 51 Testis 410
## 52 Thyroid 706
## 53 Uterus 159
## 54 Vagina 173
## 55 Whole Blood 852
Recode the sex and death classification (Hardy scale) codes into readable labels
gtex_coldata$gtex.sex <- factor(ifelse(gtex_coldata$gtex.sex == 1, "M", "F"),
levels = c("M", "F")
)
gtex_coldata$gtex.dthhrdy <- ifelse(gtex_coldata$gtex.dthhrdy == 0,
"Ventilator_Case",
ifelse(gtex_coldata$gtex.dthhrdy == 1,
"Violent_and_Fast_Death",
ifelse(
gtex_coldata$gtex.dthhrdy == 2,
"Fast_Death_of_Natural_Causes",
ifelse(
gtex_coldata$gtex.dthhrdy == 3,
"Intermediate_Death",
ifelse(
gtex_coldata$gtex.dthhrdy == 4,
"Slow_Death",
"Not Reported"
)
)
)
)
)
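The nested ifelse() calls above are hard to audit at a glance. A minimal sketch of the same recoding with a named lookup vector follows, run on a made-up vector of codes rather than on gtex_coldata. The names hardy_codes and hardy_labels are hypothetical. One difference is an assumption about intent: missing codes are labelled "Not Reported" explicitly here, whereas ifelse() returns NA whenever its test is NA.

```r
# Toy Hardy-scale codes (hypothetical values, not GTEx data)
hardy_codes <- c(0, 1, 2, 3, 4, NA)

# Named lookup vector: names are the numeric codes, values are the labels
hardy_labels <- c(
  "0" = "Ventilator_Case",
  "1" = "Violent_and_Fast_Death",
  "2" = "Fast_Death_of_Natural_Causes",
  "3" = "Intermediate_Death",
  "4" = "Slow_Death"
)

# Index the lookup by the code converted to character
recoded <- unname(hardy_labels[as.character(hardy_codes)])

# Assumption: label missing codes explicitly instead of leaving them NA
recoded[is.na(recoded)] <- "Not Reported"

print(recoded)
```

Adding or renaming a category then only requires editing the lookup vector.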
look at age and sex
gtex_coldata$SEX_AGE <- paste(gtex_coldata$gtex.sex,
gtex_coldata$gtex.age,
sep = "_"
)
look at age, sex, and type of death
gtex_coldata$SEX_AGE_DEATH <- paste(gtex_coldata$SEX_AGE,
gtex_coldata$gtex.dthhrdy,
sep = "_"
)
table(gtex_coldata$SEX_AGE_DEATH)
##
## F_20-29_Ventilator_Case F_20-29_Violent_and_Fast_Death
## 465 90
## F_30-39_Fast_Death_of_Natural_Causes F_30-39_Slow_Death
## 24 37
## F_30-39_Ventilator_Case F_30-39_Violent_and_Fast_Death
## 359 15
## F_40-49_Fast_Death_of_Natural_Causes F_40-49_Intermediate_Death
## 93 19
## F_40-49_NA F_40-49_Slow_Death
## 32 77
## F_40-49_Ventilator_Case F_40-49_Violent_and_Fast_Death
## 991 82
## F_50-59_Fast_Death_of_Natural_Causes F_50-59_Intermediate_Death
## 401 143
## F_50-59_NA F_50-59_Slow_Death
## 28 160
## F_50-59_Ventilator_Case F_50-59_Violent_and_Fast_Death
## 1077 65
## F_60-69_Fast_Death_of_Natural_Causes F_60-69_Intermediate_Death
## 492 162
## F_60-69_Slow_Death F_60-69_Ventilator_Case
## 560 862
## F_60-69_Violent_and_Fast_Death F_70-79_Fast_Death_of_Natural_Causes
## 19 54
## F_70-79_Intermediate_Death F_70-79_NA
## 11 7
## F_70-79_Slow_Death F_70-79_Ventilator_Case
## 59 58
## M_20-29_Fast_Death_of_Natural_Causes M_20-29_Intermediate_Death
## 59 50
## M_20-29_Ventilator_Case M_20-29_Violent_and_Fast_Death
## 747 90
## M_30-39_Fast_Death_of_Natural_Causes M_30-39_NA
## 81 13
## M_30-39_Slow_Death M_30-39_Ventilator_Case
## 61 721
## M_30-39_Violent_and_Fast_Death M_40-49_Fast_Death_of_Natural_Causes
## 117 372
## M_40-49_Intermediate_Death M_40-49_NA
## 15 5
## M_40-49_Slow_Death M_40-49_Ventilator_Case
## 170 1098
## M_40-49_Violent_and_Fast_Death M_50-59_Fast_Death_of_Natural_Causes
## 44 1586
## M_50-59_Intermediate_Death M_50-59_NA
## 95 24
## M_50-59_Slow_Death M_50-59_Ventilator_Case
## 325 2100
## M_50-59_Violent_and_Fast_Death M_60-69_Fast_Death_of_Natural_Causes
## 85 1843
## M_60-69_Intermediate_Death M_60-69_NA
## 495 39
## M_60-69_Slow_Death M_60-69_Ventilator_Case
## 700 1093
## M_60-69_Violent_and_Fast_Death M_70-79_Fast_Death_of_Natural_Causes
## 73 140
## M_70-79_Intermediate_Death M_70-79_NA
## 30 3
## M_70-79_Slow_Death M_70-79_Ventilator_Case
## 105 129
## M_70-79_Violent_and_Fast_Death NA_NA_NA
## 60 204
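The NA_NA_NA group and the labels ending in _NA in the table above arise because paste() converts missing values to the literal string "NA". The sketch below reproduces this on made-up values (not the GTEx metadata) and shows one possible way to keep such entries missing. The helper name paste_or_na is hypothetical.

```r
# Toy metadata with one fully missing sample and one missing death class
sex   <- c("F", "M", NA)
age   <- c("40-49", "60-69", NA)
death <- c(NA, "Slow_Death", NA)

# paste() turns NA into the string "NA"
labels <- paste(sex, age, death, sep = "_")
print(labels)

# Hypothetical helper: return NA when any component is missing
paste_or_na <- function(..., sep = "_") {
  parts <- list(...)
  out <- do.call(paste, c(parts, sep = sep))
  any_missing <- Reduce(`|`, lapply(parts, is.na))
  out[any_missing] <- NA_character_
  out
}

print(paste_or_na(sex, age, death))
```

With real NA values, table() drops these samples unless useNA = "ifany" is set, so whether to keep them as a visible group is a design choice.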
Remove one subject (GTEX-11ILO) that was identified in previous literature as an individual who completed a sex change (Paulson et al. 2017). We also restricted the analysis to samples from the TruSeq.v1 chemistry.
gtex_coldata2 <- gtex_coldata[!is.na(gtex_coldata$gtex.smgebtcht), ]
gtex_coldata3 <- gtex_coldata2[gtex_coldata2$gtex.smgebtcht == "TruSeq.v1", ]
gtex_coldata3 <- gtex_coldata3[!is.na(gtex_coldata3$gtex.subjid), ]
gtex_coldata4 <- gtex_coldata3[!gtex_coldata3$gtex.subjid == "GTEX-11ILO", ]
gtex_coldata <- gtex_coldata4
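# Count samples per tissue for each covariate.
# Columns are selected by position; judging from the plots below,
# 14 = gtex.smts, 6 = gtex.sex, 7 = gtex.age, 8 = gtex.dthhrdy,
# 199 = SEX_AGE, 200 = SEX_AGE_DEATH.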
tissue <- unique(gtex_coldata$gtex.smts)
sub_tissues <- unique(gtex_coldata$gtex.smtsd)
tissue_data <- paste(tissue, "data", sep = "_")
# Sex
sex_table <- as.data.frame(table(gtex_coldata[c(14, 6)]))
# Age
age_table <- as.data.frame(table(gtex_coldata[c(14, 7)]))
# Death
death_table <- as.data.frame(table(gtex_coldata[c(14, 8)]))
# AGE_Sex
age_sex_table <- as.data.frame(table(gtex_coldata[c(14, 199)]))
age_sex_table_v2 <- as.data.frame(table(gtex_coldata[c(14, 6, 7)]))
# AGE_Sex_Death
age_sex_death_table <- as.data.frame(table(gtex_coldata[c(14, 200)]))
age_sex_death_table_v2 <- as.data.frame(table(gtex_coldata[c(14, 6, 7, 8)]))
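Selecting columns by position, as above, silently returns the wrong columns if the metadata gains or loses a column. A small sketch of the same tabulation by column name follows. It uses a made-up data frame (toy_coldata is hypothetical), and the pairing of positions with names in the real data is inferred from the aes() mappings in the plots, not confirmed.

```r
# Toy metadata (hypothetical values); the column order is deliberately arbitrary
toy_coldata <- data.frame(
  external_id = paste0("s", 1:6),
  gtex.sex    = c("M", "F", "M", "F", "M", "M"),
  gtex.smts   = c("Liver", "Liver", "Lung", "Lung", "Lung", "Liver"),
  stringsAsFactors = FALSE
)

# Selecting by name keeps working if columns are added or reordered
sex_table_toy <- as.data.frame(table(toy_coldata[c("gtex.smts", "gtex.sex")]))
print(sex_table_toy)
```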
plot the number of samples for each sex across the tissues
ggplot(sex_table, aes(fill = gtex.sex, y = Freq, x = gtex.smts)) +
geom_bar(position = "stack", stat = "identity") +
theme_classic() +
xlab("Tissue") +
ylab("Count") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Sex for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_sex_tissue_GTEx.pdf")
## Saving 7 x 5 in image
plot the number of samples for each sex across the tissues (as a fraction)
ggplot(sex_table, aes(fill = gtex.sex, y = Freq, x = gtex.smts)) +
geom_bar(position = "fill", stat = "identity") +
theme_classic() +
xlab("Tissue") +
ylab("Fraction") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Sex for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_sex_tissue_GTEx_percent.pdf")
## Saving 7 x 5 in image
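The stacked and filled bar charts in this notebook differ only in the data frame, the fill variable, the x variable, the position, and the labels. A hedged sketch of a helper that captures the shared recipe is shown below. The function name plot_tissue_bar and the toy counts are hypothetical, and scale_fill_viridis_d() from ggplot2 is swapped in for scale_fill_viridis() from the viridis package so that the example needs only ggplot2.

```r
library(ggplot2)

# Hypothetical helper wrapping the bar-chart recipe used in this notebook.
# `position` is "stack" for counts or "fill" for fractions.
plot_tissue_bar <- function(df, fill_col, x_col, position, y_label, title) {
  ggplot(df, aes(fill = .data[[fill_col]], y = Freq, x = .data[[x_col]])) +
    geom_bar(position = position, stat = "identity") +
    theme_classic() +
    xlab("Tissue") +
    ylab(y_label) +
    scale_fill_viridis_d() +
    theme(
      axis.text.x = element_text(angle = 90, size = 3.5),
      axis.ticks = element_blank()
    ) +
    ggtitle(title)
}

# Toy counts (made-up numbers, not GTEx data)
toy_table <- data.frame(
  gtex.smts = c("Liver", "Liver", "Lung", "Lung"),
  gtex.sex  = c("M", "F", "M", "F"),
  Freq      = c(10, 5, 8, 8)
)

p <- plot_tissue_bar(toy_table, "gtex.sex", "gtex.smts",
                     position = "fill", y_label = "Fraction",
                     title = "Sex for GTEx tissue")
```

Each plot then becomes one call followed by ggsave(), which makes mismatches between a caption, a position argument, and a file name easier to spot.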
plot the number of samples for each age group across the tissues
ggplot(age_table, aes(fill = gtex.age, y = Freq, x = gtex.smts)) +
geom_bar(position = "stack", stat = "identity") +
theme_classic() +
xlab("Tissue") +
ylab("Count") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Age for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_age_tissue_GTEx.pdf")
## Saving 7 x 5 in image
plot the number of samples for each age group across the tissues (percentage)
ggplot(age_table, aes(fill = gtex.age, y = Freq, x = gtex.smts)) +
geom_bar(position = "fill", stat = "identity") +
theme_classic() +
xlab("Tissue") +
ylab("Fraction") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Age for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_age_tissue_GTEx_percent.pdf")
## Saving 7 x 5 in image
plot the number of samples for each sex/age group across the tissues
ggplot(age_sex_table, aes(fill = SEX_AGE, y = Freq, x = gtex.smts)) +
geom_bar(position = "stack", stat = "identity") +
theme_classic() +
xlab("Tissue") +
ylab("Count") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Sex and Age for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_sex_age_tissue_GTEx.pdf")
## Saving 7 x 5 in image
plot the number of samples for each sex/age group across the tissues (percentage)
ggplot(age_sex_table, aes(fill = SEX_AGE, y = Freq, x = gtex.smts)) +
geom_bar(position = "fill", stat = "identity") +
theme_classic() +
xlab("Tissue") +
ylab("Fraction") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Sex and Age for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_sex_age_tissue_GTEx_percent.pdf")
## Saving 7 x 5 in image
plot the fraction of samples for each type of death across the tissues
ggplot(death_table, aes(fill = gtex.dthhrdy, y = Freq, x = gtex.smts)) +
geom_bar(position = "fill", stat = "identity") +
theme_classic() +
xlab("Tissue") +
ylab("Fraction") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Type of Death for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_death_tissue_GTEx_percent.pdf")
## Saving 7 x 5 in image
what types of tissues do I have?
tissue
## [1] "Adipose Tissue" "Muscle" "Blood Vessel" "Heart"
## [5] "Ovary" "Uterus" "Vagina" "Breast"
## [9] "Skin" "Salivary Gland" "Brain" "Adrenal Gland"
## [13] "Thyroid" "Lung" "Spleen" "Pancreas"
## [17] "Esophagus" "Stomach" "Colon" "Small Intestine"
## [21] "Prostate" "Testis" "Nerve" "Pituitary"
## [25] "Blood" "Liver" "Kidney" "Cervix Uteri"
## [29] "Fallopian Tube" "Bladder"
for each tissue plot the sex and age
for (i in seq_along(tissue)) {
data <- age_sex_table_v2[age_sex_table_v2$gtex.smts == tissue[i], ]
title <- paste("Sex and Age for GTEx", tissue[i])
plot <- ggplot(data = data, aes(
x = gtex.age,
y = Freq, group = gtex.sex
)) +
geom_line(aes(linetype = gtex.sex)) +
geom_point() +
theme_classic() +
ggtitle(title)
print(plot)
nfile <- paste("~/results/GTEx_plots/210806_age_sex_GTEx_",
tissue[i], ".pdf",
sep = ""
)
ggsave(nfile)
}
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Saving 7 x 5 in image
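Several tissue names contain spaces ("Adipose Tissue", "Blood Vessel"), so the loop above writes file names with spaces in them. If that is unwanted, one option is to sanitize the names before building the paths. This is only a sketch: the variable names are hypothetical and the output directory is assumed to be the one used throughout this notebook.

```r
# Tissue names as they appear in the metadata (two include a space)
tissue_names <- c("Adipose Tissue", "Muscle", "Blood Vessel")

# Assumed output directory, mirroring the paths used in this notebook
out_dir <- file.path("~", "results", "GTEx_plots")

# Replace runs of non-alphanumeric characters with a single underscore
safe_names <- gsub("[^A-Za-z0-9]+", "_", tissue_names)

# Build one PDF path per tissue
pdf_paths <- file.path(
  out_dir,
  paste0("210806_age_sex_GTEx_", safe_names, ".pdf")
)
print(pdf_paths)
```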
combine the results together
ggplot(data = age_sex_table_v2, aes(
x = gtex.age,
y = Freq,
group = interaction(
gtex.sex,
gtex.smts
)
)) +
geom_line(aes(color = interaction(gtex.sex, gtex.smts))) +
geom_point() +
theme_classic() +
ggtitle("Sex and Age GTEx Tissues")
ggsave("~/results/GTEx_plots/210806_sex_age_all_tissue_GTEx.pdf")
## Saving 7 x 5 in image
Look at the sub tissues
# Sex
sex_sub_table <- as.data.frame(table(gtex_coldata[c(15, 6)]))
# Age
age_sub_table <- as.data.frame(table(gtex_coldata[c(15, 7)]))
# Death
death_sub_table <- as.data.frame(table(gtex_coldata[c(15, 8)]))
# AGE_Sex
age_sex_sub_table <- as.data.frame(table(gtex_coldata[c(15, 199)]))
age_sex_sub_table_v2 <- as.data.frame(table(gtex_coldata[c(15, 6, 7)]))
# AGE_Sex_Death
age_sex_death_sub_table <- as.data.frame(table(gtex_coldata[c(15, 200)]))
age_sex_death_sub_table_v2 <- as.data.frame(table(
gtex_coldata[c(15, 6, 7, 8)]
))
Plot the number of samples for each sex across the sub-tissue groups
ggplot(sex_sub_table, aes(fill = gtex.sex, y = Freq, x = gtex.smtsd)) +
geom_bar(position = "stack", stat = "identity") +
theme_classic() +
xlab("Tissue") +
ylab("Count") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Sex for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_sex_tissue_structures_GTEx.pdf")
## Saving 7 x 5 in image
Plot the fraction of samples for each sex across the sub-tissue groups
ggplot(sex_sub_table, aes(fill = gtex.sex, y = Freq, x = gtex.smtsd)) +
geom_bar(position = "fill", stat = "identity") +
theme_classic() +
xlab("Tissue") +
ylab("Fraction") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Sex for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_sex_tissue_structures_GTEx_percent.pdf")
## Saving 7 x 5 in image
Plot the number of samples for each age group across the sub-tissue groups
ggplot(age_sub_table, aes(fill = gtex.age, y = Freq, x = gtex.smtsd)) +
geom_bar(position = "stack", stat = "identity") +
theme_classic() +
xlab("Tissue") +
ylab("Count") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Age for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_age_tissue_structures_GTEx.pdf")
## Saving 7 x 5 in image
Plot the fraction of samples for each age group across the sub-tissue groups
ggplot(age_sub_table, aes(fill = gtex.age, y = Freq, x = gtex.smtsd)) +
geom_bar(position = "fill", stat = "identity") +
theme_classic() +
xlab("Tissue") +
ylab("Fraction") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Age for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_age_tissue_structures_GTEx_percent.pdf")
## Saving 7 x 5 in image
Plot the number of samples for each sex/age across the sub-tissue groups
ggplot(age_sex_sub_table, aes(fill = SEX_AGE, y = Freq, x = gtex.smtsd)) +
geom_bar(position = "stack", stat = "identity") +
theme_classic() +
xlab("Tissue") +
ylab("Count") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Sex and Age for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_sex_age_tissue_structures_GTEx.pdf")
## Saving 7 x 5 in image
Plot the fraction of samples for each sex/age group across the sub-tissue groups
ggplot(age_sex_sub_table, aes(fill = SEX_AGE, y = Freq, x = gtex.smtsd)) +
geom_bar(position = "fill", stat = "identity") +
theme_classic() +
xlab("Tissue") +
ylab("Fraction") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Sex and Age for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_sex_age_tissue_structures_GTEx_percent.pdf")
## Saving 7 x 5 in image
Plot the fraction of samples for each type of death across the sub-tissue groups
ggplot(death_sub_table, aes(fill = gtex.dthhrdy, y = Freq, x = gtex.smtsd)) +
geom_bar(position = "fill", stat = "identity") +
theme_classic() +
xlab("Tissue") +
ylab("Fraction") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Type of Death for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_death_tissue_structures_GTEx_percent.pdf")
## Saving 7 x 5 in image
Plot the RIN score for each sex across the tissues
ggplot(gtex_coldata, aes(x = gtex.smts, y = gtex.smrin, fill = gtex.sex)) +
geom_boxplot(position = position_dodge(1)) +
theme_classic() +
xlab("Tissue") +
ylab("RIN Score") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("RIN for GTEx tissue") +
geom_hline(yintercept = 5, linetype = "dashed", color = "red")
ggsave("~/results/GTEx_plots/210806_RIN_tissue_GTEx.pdf")
## Saving 7 x 5 in image
Plot the autolysis score.
The autolysis score was assigned by a pathologist during a visual inspection of the histology image. The assigned values ranged from 0 to 3 (None, Mild, Moderate, and Severe).
autolysis_score <- as.data.frame(table(gtex_coldata[c(14, 10)]))
ggplot(autolysis_score, aes(fill = gtex.smatsscr, y = Freq, x = gtex.smts)) +
geom_bar(position = "fill", stat = "identity") +
theme_classic() +
xlab("Tissue") +
ylab("Fraction") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Autolysis Score for GTEx tissue")
## Warning: Removed 4 rows containing missing values (`geom_bar()`).
ggsave("~/results/GTEx_plots/210806_autolysis_score_tissue_GTEx_percent.pdf")
## Saving 7 x 5 in image
## Warning: Removed 4 rows containing missing values (`geom_bar()`).
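The autolysis categories described above (0 to 3 for None, Mild, Moderate, and Severe) can be attached as factor labels so that the plot legend is self-explanatory. The sketch below uses made-up scores and assumes the column holds the numeric codes 0 to 3. How gtex.smatsscr is actually encoded in the recount3 metadata is not confirmed here.

```r
# Toy autolysis scores (hypothetical values); NA marks a missing score
scores <- c(0, 1, 1, 2, 3, NA)

# Map the numeric codes to the pathologist's categories, in severity order
autolysis <- factor(
  scores,
  levels = 0:3,
  labels = c("None", "Mild", "Moderate", "Severe")
)

# Tabulate, keeping missing scores visible
print(table(autolysis, useNA = "ifany"))
```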
Plot the total number of reads aligned/mapped
ggplot(gtex_coldata, aes(x = gtex.smts, y = gtex.smmppd, fill = gtex.sex)) +
geom_boxplot(position = position_dodge(1)) +
theme_classic() +
xlab("Tissue") +
ylab("Total number of reads aligned/map") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Total number of reads aligned/map for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_ALIGNED_tissue_GTEx.pdf")
## Saving 7 x 5 in image
Plot the ischemic time for the samples
ggplot(gtex_coldata, aes(x = gtex.smts, y = gtex.smtsisch, fill = gtex.sex)) +
geom_boxplot(position = position_dodge(1)) +
theme_classic() +
xlab("Tissue") +
ylab("Ischemic Time") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Ischemic Time for GTEx tissue")
## Warning: Removed 34 rows containing non-finite values (`stat_boxplot()`).
ggsave("~/results/GTEx_plots/210806_Ischemic_Time_tissue_GTEx.pdf")
## Saving 7 x 5 in image
## Warning: Removed 34 rows containing non-finite values (`stat_boxplot()`).
Plot the mapping rate for the samples
ggplot(gtex_coldata, aes(x = gtex.smts, y = gtex.smmaprt, fill = gtex.sex)) +
geom_boxplot(position = position_dodge(1)) +
theme_classic() +
xlab("Tissue") +
ylab("Mapping Rate") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Mapping Rate for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_mapping_rate_tissue_GTEx.pdf")
## Saving 7 x 5 in image
Look at the sub-tissue groups
Plot the RIN score
High-quality RNA has a RIN of at least 8, whereas partially fragmented RNA has a RIN in the range of 6–8. Any RNA sample with a RIN below 5 should not be subjected to further fragmentation during the ScriptSeq protocol, because it will generate smaller-than-desired fragments.
ggplot(gtex_coldata, aes(x = gtex.smtsd, y = gtex.smrin, fill = gtex.sex)) +
geom_boxplot(position = position_dodge(1)) +
theme_classic() +
xlab("Tissue") +
ylab("RIN Score") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("RIN for GTEx tissue") +
geom_hline(yintercept = 5, linetype = "dashed", color = "red")
ggsave("~/results/GTEx_plots/210806_RIN_tissue_structures_GTEx.pdf")
## Saving 7 x 5 in image
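The RIN ranges quoted above can be turned into explicit classes to count how many samples fall into each. The sketch below uses made-up RIN values. The class names and the choice to close intervals on the left are assumptions, and the passes_filter flag mirrors the RIN > 5 cutoff marked by the dashed line.

```r
# Toy RIN values (made up, not GTEx data)
rin <- c(4.2, 5.0, 5.8, 6.5, 8.0, 9.1)

# right = FALSE closes intervals on the left: [6, 8) and [8, Inf)
rin_class <- cut(
  rin,
  breaks = c(-Inf, 6, 8, Inf),
  labels = c("below_6", "6_to_8", "8_or_above"),
  right = FALSE
)

# Flag samples strictly above the cutoff of 5
passes_filter <- rin > 5

print(data.frame(rin, rin_class, passes_filter))
```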
Plot the autolysis score
autolysis_score <- as.data.frame(table(gtex_coldata[c(15, 10)]))
ggplot(autolysis_score, aes(fill = gtex.smatsscr, y = Freq, x = gtex.smtsd)) +
geom_bar(position = "fill", stat = "identity") +
theme_classic() +
xlab("Tissue") +
ylab("Fraction") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Autolysis Score for GTEx tissue")
## Warning: Removed 56 rows containing missing values (`geom_bar()`).
ggsave(
"~/results/GTEx_plots/210806_autolysis_score_tissue_structure_GTEx_percent.pdf"
)
## Saving 7 x 5 in image
## Warning: Removed 56 rows containing missing values (`geom_bar()`).
Plot the number of reads aligned
ggplot(gtex_coldata, aes(x = gtex.smtsd, y = gtex.smmppd, fill = gtex.sex)) +
geom_boxplot(position = position_dodge(1)) +
theme_classic() +
xlab("Tissue") +
ylab("Total number of reads aligned/map") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Total number of reads aligned/map for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_aligned_tissue_structures_GTEx.pdf")
## Saving 7 x 5 in image
Plot the ischemic time: the interval between actual death, presumed death, or cross clamp application and final tissue stabilization
ggplot(gtex_coldata, aes(x = gtex.smtsd, y = gtex.smtsisch, fill = gtex.sex)) +
geom_boxplot(position = position_dodge(1)) +
theme_classic() +
xlab("Tissue") +
ylab("Ischemic Time (mins)") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Ischemic Time for GTEx tissue")
## Warning: Removed 34 rows containing non-finite values (`stat_boxplot()`).
ggsave("~/results/GTEx_plots/210806_ischemic_time_tissue_structures_GTEx.pdf")
## Saving 7 x 5 in image
## Warning: Removed 34 rows containing non-finite values (`stat_boxplot()`).
Plot the Mapping Rate: Ratio of total mapped reads to total reads
ggplot(gtex_coldata, aes(x = gtex.smtsd, y = gtex.smmaprt, fill = gtex.sex)) +
geom_boxplot(position = position_dodge(1)) +
theme_classic() +
xlab("Tissue") +
ylab("Mapping Rate") +
scale_fill_viridis(discrete = TRUE) +
theme(
axis.text.x = element_text(angle = 90, size = 3.5),
axis.ticks = element_blank()
) +
ggtitle("Mapping Rate for GTEx tissue")
ggsave("~/results/GTEx_plots/210806_mapping_rate_tissue_structures_GTEx.pdf")
## Saving 7 x 5 in image
Look at all the tissue/sex combinations that have fewer than 10 samples
sex_sub_table[sex_sub_table$Freq < 10, ]
## gtex.smtsd gtex.sex Freq
## 24 Cervix - Ectocervix M 0
## 25 Cervix - Endocervix M 0
## 31 Fallopian Tube M 0
## 35 Kidney - Medulla M 3
## 41 Ovary M 0
## 52 Uterus M 0
## 53 Vagina M 0
## 61 Bladder F 7
## 78 Cervix - Ectocervix F 9
## 85 Fallopian Tube F 9
## 89 Kidney - Medulla F 1
## 98 Prostate F 0
## 104 Testis F 0
unique(sex_sub_table$gtex.smtsd[sex_sub_table$Freq < 10])
## [1] Cervix - Ectocervix Cervix - Endocervix Fallopian Tube
## [4] Kidney - Medulla Ovary Uterus
## [7] Vagina Bladder Prostate
## [10] Testis
## 54 Levels: Adipose - Subcutaneous ... Whole Blood
Remove all sub-tissues with fewer than 10 samples for either sex, except bladder.
The removed sub-tissues are sex-specific or have few samples from one sex: cervix-ectocervix, cervix-endocervix, fallopian tube, kidney-medulla, ovary, uterus, vagina, prostate, and testis.
# Keep bladder, which is the 8th entry of the unique() output above
remove_tissues <- unique(sex_sub_table$gtex.smtsd[sex_sub_table$Freq < 10])[-8]
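Dropping bladder with [-8] depends on the order of the unique() output, which would change if the upstream filtering changed. A sketch of the same exclusion by name follows, on made-up counts. The object names toy_counts, low_n, and remove_toy are hypothetical.

```r
# Toy per-sex counts by sub-tissue (made-up numbers)
toy_counts <- data.frame(
  gtex.smtsd = c("Liver", "Ovary", "Bladder", "Liver", "Ovary", "Bladder"),
  gtex.sex   = c("M", "M", "M", "F", "F", "F"),
  Freq       = c(176, 0, 14, 75, 195, 7),
  stringsAsFactors = FALSE
)

# Sub-tissues where at least one sex has fewer than 10 samples
low_n <- unique(toy_counts$gtex.smtsd[toy_counts$Freq < 10])

# Exempt bladder by name instead of by position
remove_toy <- setdiff(low_n, "Bladder")
print(remove_toy)
```

If the sub-tissue column is a factor, wrapping it in as.character() before setdiff() keeps the comparison on the labels.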
I also removed samples with a RIN score less than or equal to 5
gtex_coldata_v2 <- gtex_coldata[!gtex_coldata$gtex.smtsd %in% remove_tissues, ]
gtex_coldata_v2 <- gtex_coldata_v2[gtex_coldata_v2$gtex.smrin > 5, ]
as.data.frame(table(gtex_coldata_v2[c(15, 6)]))
## gtex.smtsd gtex.sex Freq
## 1 Adipose - Subcutaneous M 475
## 2 Adipose - Visceral (Omentum) M 383
## 3 Adrenal Gland M 168
## 4 Artery - Aorta M 292
## 5 Artery - Coronary M 152
## 6 Artery - Tibial M 466
## 7 Bladder M 14
## 8 Brain - Amygdala M 115
## 9 Brain - Anterior cingulate cortex (BA24) M 145
## 10 Brain - Caudate (basal ganglia) M 197
## 11 Brain - Cerebellar Hemisphere M 177
## 12 Brain - Cerebellum M 195
## 13 Brain - Cortex M 199
## 14 Brain - Frontal Cortex (BA9) M 158
## 15 Brain - Hippocampus M 156
## 16 Brain - Hypothalamus M 161
## 17 Brain - Nucleus accumbens (basal ganglia) M 192
## 18 Brain - Putamen (basal ganglia) M 167
## 19 Brain - Spinal cord (cervical c-1) M 107
## 20 Brain - Substantia nigra M 111
## 21 Breast - Mammary Tissue M 302
## 22 Cells - Cultured fibroblasts M 338
## 23 Cells - EBV-transformed lymphocytes M 119
## 24 Colon - Sigmoid M 251
## 25 Colon - Transverse M 277
## 26 Esophagus - Gastroesophageal Junction M 267
## 27 Esophagus - Mucosa M 407
## 28 Esophagus - Muscularis M 359
## 29 Heart - Atrial Appendage M 306
## 30 Heart - Left Ventricle M 335
## 31 Kidney - Cortex M 73
## 32 Liver M 176
## 33 Lung M 434
## 34 Minor Salivary Gland M 127
## 35 Muscle - Skeletal M 584
## 36 Nerve - Tibial M 444
## 37 Pancreas M 223
## 38 Pituitary M 217
## 39 Skin - Not Sun Exposed (Suprapubic) M 432
## 40 Skin - Sun Exposed (Lower leg) M 509
## 41 Small Intestine - Terminal Ileum M 123
## 42 Spleen M 161
## 43 Stomach M 240
## 44 Thyroid M 463
## 45 Whole Blood M 549
## 46 Adipose - Subcutaneous F 249
## 47 Adipose - Visceral (Omentum) F 179
## 48 Adrenal Gland F 106
## 49 Artery - Aorta F 160
## 50 Artery - Coronary F 100
## 51 Artery - Tibial F 220
## 52 Bladder F 7
## 53 Brain - Amygdala F 47
## 54 Brain - Anterior cingulate cortex (BA24) F 56
## 55 Brain - Caudate (basal ganglia) F 76
## 56 Brain - Cerebellar Hemisphere F 69
## 57 Brain - Cerebellum F 84
## 58 Brain - Cortex F 83
## 59 Brain - Frontal Cortex (BA9) F 63
## 60 Brain - Hippocampus F 64
## 61 Brain - Hypothalamus F 60
## 62 Brain - Nucleus accumbens (basal ganglia) F 70
## 63 Brain - Putamen (basal ganglia) F 54
## 64 Brain - Spinal cord (cervical c-1) F 64
## 65 Brain - Substantia nigra F 43
## 66 Breast - Mammary Tissue F 179
## 67 Cells - Cultured fibroblasts F 181
## 68 Cells - EBV-transformed lymphocytes F 70
## 69 Colon - Sigmoid F 138
## 70 Colon - Transverse F 155
## 71 Esophagus - Gastroesophageal Junction F 132
## 72 Esophagus - Mucosa F 217
## 73 Esophagus - Muscularis F 194
## 74 Heart - Atrial Appendage F 143
## 75 Heart - Left Ventricle F 153
## 76 Kidney - Cortex F 21
## 77 Liver F 75
## 78 Lung F 206
## 79 Minor Salivary Gland F 51
## 80 Muscle - Skeletal F 291
## 81 Nerve - Tibial F 214
## 82 Pancreas F 132
## 83 Pituitary F 84
## 84 Skin - Not Sun Exposed (Suprapubic) F 205
## 85 Skin - Sun Exposed (Lower leg) F 266
## 86 Small Intestine - Terminal Ileum F 70
## 87 Spleen F 93
## 88 Stomach F 144
## 89 Thyroid F 237
## 90 Whole Blood F 291
Combine the count data and metadata of the filtered samples
name <- paste(gtex_proj_info[, 1], "rse", sep = "_")
gtex_counts <- as.data.frame(assay(get(name[1])))
for (i in 2:length(name)) {
counts <- as.data.frame(assay(get(name[i])))
gtex_counts <- cbind(gtex_counts, counts)
}
ids <- gtex_coldata_v2$external_id
gtex_counts_v2 <- gtex_counts[, colnames(gtex_counts) %in% ids]
dim(gtex_counts_v2)
## [1] 63856 17542
dim(gtex_coldata_v2)
## [1] 17542 200
setdiff(gtex_coldata_v2$external_id, colnames(gtex_counts_v2))
## character(0)
gtex_coldata_v2$external_id[duplicated(gtex_coldata_v2$external_id)]
## character(0)
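The checks above confirm that every filtered sample has a count column and that no ID is duplicated, but filtering with %in% keeps the column order of the count table, which is not guaranteed to match the row order of the metadata. The sketch below, on made-up objects (toy_expr and toy_meta are hypothetical), shows one way to align the two and verify the result. The same check applies to any later subset, such as a per-tissue count matrix and the metadata rows used to color its PCA points.

```r
# Toy count table (genes x samples) with samples in a different order
toy_expr <- data.frame(
  s3 = c(5, 0), s1 = c(2, 7), s2 = c(1, 1),
  row.names = c("geneA", "geneB")
)
toy_meta <- data.frame(
  external_id = c("s1", "s2", "s3"),
  gtex.smrin  = c(7.1, 6.4, 8.2),
  stringsAsFactors = FALSE
)

# Reorder the count columns to follow the metadata rows
toy_expr <- toy_expr[, match(toy_meta$external_id, colnames(toy_expr))]

# Stop early if the two objects are not in the same sample order
stopifnot(identical(colnames(toy_expr), toy_meta$external_id))
print(colnames(toy_expr))
```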
saveRDS(gtex_coldata_v2, "~/data/metadata_gtex_filter_samples.rds")
saveRDS(gtex_counts_v2, "~/data/counts_gtex_filter_samples.rds")
table(gtex_coldata_v2$gtex.smnabtcht)
##
## RNA Extraction from Paxgene-derived Lysate Plate Based
## 10737
## RNA isolation_PAXgene Blood RNA (Manual)
## 840
## RNA isolation_PAXgene Tissue miRNA
## 5257
## RNA isolation_Trizol Manual (Cell Pellet)
## 708
Covariates to look at in the PCA: ischemic time, RIN, age, and batch, in liver tissue only
unique(gtex_coldata_v2$gtex.smtsd)
## [1] "Adipose - Subcutaneous"
## [2] "Adipose - Visceral (Omentum)"
## [3] "Muscle - Skeletal"
## [4] "Artery - Tibial"
## [5] "Artery - Aorta"
## [6] "Artery - Coronary"
## [7] "Heart - Atrial Appendage"
## [8] "Heart - Left Ventricle"
## [9] "Breast - Mammary Tissue"
## [10] "Cells - Cultured fibroblasts"
## [11] "Skin - Sun Exposed (Lower leg)"
## [12] "Skin - Not Sun Exposed (Suprapubic)"
## [13] "Minor Salivary Gland"
## [14] "Brain - Hippocampus"
## [15] "Brain - Cortex"
## [16] "Brain - Putamen (basal ganglia)"
## [17] "Brain - Anterior cingulate cortex (BA24)"
## [18] "Brain - Cerebellar Hemisphere"
## [19] "Brain - Frontal Cortex (BA9)"
## [20] "Brain - Spinal cord (cervical c-1)"
## [21] "Brain - Substantia nigra"
## [22] "Brain - Nucleus accumbens (basal ganglia)"
## [23] "Brain - Hypothalamus"
## [24] "Brain - Cerebellum"
## [25] "Brain - Caudate (basal ganglia)"
## [26] "Brain - Amygdala"
## [27] "Adrenal Gland"
## [28] "Thyroid"
## [29] "Lung"
## [30] "Spleen"
## [31] "Pancreas"
## [32] "Esophagus - Muscularis"
## [33] "Esophagus - Mucosa"
## [34] "Esophagus - Gastroesophageal Junction"
## [35] "Stomach"
## [36] "Colon - Transverse"
## [37] "Colon - Sigmoid"
## [38] "Small Intestine - Terminal Ileum"
## [39] "Nerve - Tibial"
## [40] "Pituitary"
## [41] "Whole Blood"
## [42] "Cells - EBV-transformed lymphocytes"
## [43] "Liver"
## [44] "Kidney - Cortex"
## [45] "Bladder"
ids <- gtex_coldata_v2$external_id[gtex_coldata_v2$gtex.smtsd == "Liver"]
col_data_sub <- gtex_coldata_v2[gtex_coldata_v2$gtex.smtsd == "Liver", ]
counts_sub <- gtex_counts_v2[, colnames(gtex_counts_v2) %in% ids]
# One sample was removed because it has an anomalous count for one gene. We did not remove this sample in the overall study, only for the vst calculation.
counts_sub <- counts_sub[, colnames(counts_sub) != "GTEX-WK11-1326-SM-4OOSI.1"]
vst_sub <- vst(as.matrix(counts_sub))
## converting counts to integer mode
pca_sub <- prcomp(t(vst_sub))
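Before reading the pairs plots, it can help to know how much variance each principal component carries. The sketch below computes this from a prcomp() result. It runs on a random toy matrix standing in for vst_sub (the names toy_vst and toy_pca are hypothetical), so the printed numbers say nothing about the liver data.

```r
set.seed(1)

# Toy matrix standing in for vst_sub: 50 "genes" x 8 "samples" of random values
toy_vst <- matrix(rnorm(50 * 8), nrow = 50, ncol = 8)

# prcomp() expects samples in rows, hence the transpose
toy_pca <- prcomp(t(toy_vst))

# Proportion of variance explained by each principal component
var_explained <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)
print(round(var_explained[1:5], 3))
```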
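Color the PCA points by RIN: the lighter the color, the higher the value.
fun <- colorRamp(c("black", "#FDE725FF"))
# Min-max scale RIN to [0, 1] before mapping it onto the color ramp
rin_scaled <- with(
col_data_sub,
(gtex.smrin - min(gtex.smrin)) / diff(range(gtex.smrin))
)
# colorRamp() returns RGB values on a 0-255 scale
mycolors <- rgb(fun(rin_scaled), maxColorValue = 255)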
par(cex = 1.0, cex.axis = 0.8, cex.main = 0.8)
pairs(pca_sub$x[, 1:5], col = mycolors,
main = "Principal components analysis bi-plot\nPCs 1-5",
pch = 16)
# Repeat the coloring for ischemic time, min-max scaled to [0, 1]
isch_scaled <- with(col_data_sub,
(gtex.smtsisch - min(gtex.smtsisch)) / diff(range(gtex.smtsisch)))
mycolors <- rgb(fun(isch_scaled), maxColorValue = 255)
par(cex = 1.0, cex.axis = 0.8, cex.main = 0.8)
pairs(pca_sub$x[, 1:5], col = mycolors,
main = "Principal components analysis bi-plot\nPCs 1-5",
pch = 16)
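The two coloring steps above follow the same pattern: min-max scale a covariate to [0, 1], then map it onto a black-to-yellow ramp. A sketch of a reusable helper is given below. The name value_to_color is hypothetical, the input is made up, and returning NA for missing values is an assumption. Because colorRamp() returns RGB values on a 0 to 255 scale, maxColorValue is set to 255 here.

```r
# Hypothetical helper: map a numeric covariate to a black-to-yellow gradient
value_to_color <- function(x) {
  ramp <- colorRamp(c("black", "#FDE725"))
  # Min-max scale to [0, 1], ignoring missing values
  scaled <- (x - min(x, na.rm = TRUE)) / diff(range(x, na.rm = TRUE))
  cols <- rep(NA_character_, length(x))
  ok <- !is.na(scaled)
  # colorRamp() returns RGB values on a 0-255 scale
  cols[ok] <- rgb(ramp(scaled[ok]), maxColorValue = 255)
  cols
}

# Toy RIN values (made up); the NA stays NA in the output
rin_toy <- c(5.5, 7.0, 8.5, NA)
print(value_to_color(rin_toy))
```

The lowest value maps to black and the highest to the lightest yellow, matching the reading rule used for the pairs plots.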
cor(col_data_sub$gtex.smtsisch, col_data_sub$gtex.smrin, method = "spearman")
## [1] -0.5652864
A moderate negative correlation between ischemic time and RIN (Spearman's rho of about -0.57), but not a strong one.
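Spearman's correlation is the Pearson correlation computed on ranks, so it measures whether RIN tends to fall as ischemic time rises, without assuming a linear relationship. The sketch below checks that equivalence on made-up values. The numbers are not GTEx data and the variable names are hypothetical.

```r
# Toy values (made up): longer ischemic time loosely paired with lower RIN
ischemic_min <- c(30, 120, 300, 600, 900, 1200)
rin_values   <- c(8.9, 8.1, 8.4, 6.9, 7.2, 6.0)

rho <- cor(ischemic_min, rin_values, method = "spearman")

# Spearman's rho equals the Pearson correlation of the ranks
rho_from_ranks <- cor(rank(ischemic_min), rank(rin_values))
stopifnot(isTRUE(all.equal(rho, rho_from_ranks)))

print(rho)
```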
age <- col_data_sub$gtex.age
age <- ifelse(age == "20-29", "#440154FF", age)
age <- ifelse(age == "30-39", "#414487FF", age)
age <- ifelse(age == "40-49", "#2A788EFF", age)
age <- ifelse(age == "50-59", "#22A884FF", age)
age <- ifelse(age == "60-69", "#7AD151FF", age)
age <- ifelse(age == "70-79", "#FDE725FF", age)
par(cex = 1.0, cex.axis = 0.8, cex.main = 0.8)
pairs(pca_sub$x[, 1:5], col = age,
main = "Principal components analysis bi-plot\nPCs 1-5", pch = 16)
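The chain of ifelse() calls above maps each age bracket to a color. A named vector gives the same mapping in one lookup and keeps bracket and color side by side. This is a sketch on a made-up age vector. The hex codes are the ones used above, and the names age_palette and age_toy are hypothetical.

```r
# Age bracket -> color, using the same hex codes as the ifelse() chain
age_palette <- c(
  "20-29" = "#440154FF", "30-39" = "#414487FF", "40-49" = "#2A788EFF",
  "50-59" = "#22A884FF", "60-69" = "#7AD151FF", "70-79" = "#FDE725FF"
)

# Toy age brackets (made up)
age_toy <- c("50-59", "20-29", "70-79", "50-59")

# Look up the color for every sample in one step
age_colors <- unname(age_palette[age_toy])
print(age_colors)
```

An age bracket missing from the palette yields NA here, which makes an unexpected category easy to detect.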
There is some grouping by age, but most of the structure follows RIN and ischemic time.
All of the investigated covariates will be used in downstream models if possible.
#plots for age and batches
*Worried about the low RIN and high autolysis scores in kidney
*Worried about the quality of the alignment of whole blood
NA
Done in script
Location of final scripts:
"/home/rstudio/script"
Location of data produced:
"/home/rstudio/results/GTEx_plots/"
Dates when operations were done:
210810 and again on 220803 for project
sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] DESeq2_1.38.2 RColorBrewer_1.1-3
## [3] viridis_0.6.2 viridisLite_0.4.1
## [5] forcats_0.5.2 stringr_1.5.0
## [7] purrr_1.0.0 readr_2.1.3
## [9] tidyr_1.2.1 tibble_3.1.8
## [11] ggplot2_3.4.0 tidyverse_1.3.2
## [13] dbplyr_2.2.1 dplyr_1.0.10
## [15] rlang_1.0.6 recount3_1.8.0
## [17] SummarizedExperiment_1.28.0 Biobase_2.58.0
## [19] GenomicRanges_1.50.2 GenomeInfoDb_1.34.4
## [21] IRanges_2.32.0 S4Vectors_0.36.1
## [23] BiocGenerics_0.44.0 MatrixGenerics_1.10.0
## [25] matrixStats_0.63.0
##
## loaded via a namespace (and not attached):
## [1] googledrive_2.0.0 colorspace_2.0-3 rjson_0.2.21
## [4] ellipsis_0.3.2 XVector_0.38.0 fs_1.5.2
## [7] rstudioapi_0.14 farver_2.1.1 bit64_4.0.5
## [10] AnnotationDbi_1.60.0 fansi_1.0.3 lubridate_1.9.0
## [13] xml2_1.3.3 codetools_0.2-18 R.methodsS3_1.8.2
## [16] cachem_1.0.6 geneplotter_1.76.0 knitr_1.41
## [19] jsonlite_1.8.4 Rsamtools_2.14.0 broom_1.0.2
## [22] annotate_1.76.0 png_0.1-8 R.oo_1.25.0
## [25] compiler_4.2.2 httr_1.4.4 backports_1.4.1
## [28] assertthat_0.2.1 Matrix_1.5-1 fastmap_1.1.0
## [31] gargle_1.2.1 cli_3.5.0 htmltools_0.5.4
## [34] tools_4.2.2 gtable_0.3.1 glue_1.6.2
## [37] GenomeInfoDbData_1.2.9 rappdirs_0.3.3 Rcpp_1.0.9
## [40] cellranger_1.1.0 jquerylib_0.1.4 vctrs_0.5.1
## [43] Biostrings_2.66.0 rtracklayer_1.58.0 xfun_0.36
## [46] rvest_1.0.3 timechange_0.1.1 lifecycle_1.0.3
## [49] restfulr_0.0.15 XML_3.99-0.13 googlesheets4_1.0.1
## [52] zlibbioc_1.44.0 scales_1.2.1 ragg_1.2.4
## [55] hms_1.1.2 parallel_4.2.2 yaml_2.3.6
## [58] curl_4.3.3 memoise_2.0.1 gridExtra_2.3
## [61] sass_0.4.4 stringi_1.7.8 RSQLite_2.2.20
## [64] highr_0.10 BiocIO_1.8.0 filelock_1.0.2
## [67] BiocParallel_1.32.5 systemfonts_1.0.4 pkgconfig_2.0.3
## [70] bitops_1.0-7 evaluate_0.19 lattice_0.20-45
## [73] labeling_0.4.2 GenomicAlignments_1.34.0 bit_4.0.5
## [76] tidyselect_1.2.0 magrittr_2.0.3 R6_2.5.1
## [79] generics_0.1.3 DelayedArray_0.24.0 DBI_1.1.3
## [82] pillar_1.8.1 haven_2.5.1 withr_2.5.0
## [85] KEGGREST_1.38.0 RCurl_1.98-1.9 modelr_0.1.10
## [88] crayon_1.5.2 utf8_1.2.2 BiocFileCache_2.6.0
## [91] tzdb_0.3.0 rmarkdown_2.19 locfit_1.5-9.7
## [94] grid_4.2.2 readxl_1.4.1 data.table_1.14.6
## [97] blob_1.2.3 reprex_2.0.2 digest_0.6.31
## [100] xtable_1.8-4 textshaping_0.3.6 R.utils_2.12.2
## [103] munsell_0.5.0 bslib_0.4.2 sessioninfo_1.2.2